Pesquisa | Portal Regional da BVS

1.

Editorial: Enhanced human modeling in robotics for socially-aware place navigation.

Tsintotas, Konstantinos A; Kansizoglou, Ioannis; Pastra, Katerina; Aloimonos, Yiannis; Gasteratos, Antonios; Sirakoulis, Giorgios Ch; Sandini, Giulio.

Front Robot AI ; 11: 1348022, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38495301

2.

Motor-invariants for action understanding in video.

Dessalene, Eadom; Aloimonos, Yiannis.

Phys Life Rev ; 47: 20-21, 2023 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-37677926

Assuntos

Meios de Comunicação , Modelos Genéticos , Filogenia

3.

Ajna: Generalized deep uncertainty for minimal perception on parsimonious robots.

Sanket, Nitin J; Singh, Chahat Deep; Fermüller, Cornelia; Aloimonos, Yiannis.

Sci Robot ; 8(81): eadd5139, 2023 Aug 16.

Artigo em Inglês | MEDLINE | ID: mdl-37585545

RESUMO

Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function while training neural networks to obtain uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile, without computing depth but rather using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We successfully evaluate and demonstrate the proposed Ajna network on four aforementioned common robotics and computer vision tasks and show comparable results to methods directly using depth. Our work demonstrates a generalized deep uncertainty method and demonstrates its utilization in robotics applications.

4.

Semi-supervised training using cooperative labeling of weakly annotated data for nodule detection in chest CT.

Maynord, Michael; Farhangi, M Mehdi; Fermüller, Cornelia; Aloimonos, Yiannis; Levine, Gary; Petrick, Nicholas; Sahiner, Berkman; Pezeshk, Aria.

Med Phys ; 50(7): 4255-4268, 2023 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-36630691

RESUMO

PURPOSE: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm. As most clinically produced data are weakly-annotated - produced for use by humans rather than machines and lacking information machine learning depends upon - this approach allows us to incorporate a wider range of clinical data and thereby increase the training set size. METHODS: Our pseudo-labeling method consists of multiple stages. In the first stage, a previously established network is trained using a limited number of samples with high-quality expert-produced annotations. This network is used to generate annotations for a separate larger dataset that contains only weakly annotated scans. In the second stage, by cross-checking the two types of annotations against each other, we obtain higher-fidelity annotations. In the third stage, we extract training data from the weakly annotated scans, and combine it with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer-aided detection (CADe) system for nodule detection in chest CT. RESULTS: We evaluated the proposed approach by presenting the network with different numbers of expert-annotated scans in training and then testing the CADe using an independent expert-annotated dataset. We demonstrate that when availability of expert annotations is severely limited, the inclusion of weakly-labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false-positive rates. CONCLUSIONS: Our proposed approach can effectively merge a weakly-annotated dataset with a small, well-annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.

Assuntos

Algoritmos , Tomografia Computadorizada por Raios X , Humanos , Aprendizado de Máquina , Aprendizado de Máquina Supervisionado , Processamento de Imagem Assistida por Computador/métodos

5.

Forecasting Action Through Contact Representations From First Person Video.

Dessalene, Eadom; Devaraj, Chinmaya; Maynord, Michael; Fermuller, Cornelia; Aloimonos, Yiannis.

IEEE Trans Pattern Anal Mach Intell ; 45(6): 6703-6714, 2023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-33507864

RESUMO

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action is reliant on anticipation of contact as is demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations we train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer term temporal semantic relations through the use of a graph modeling transitions between contact delineated action states. Use of the Anticipation Module within Ego-OMG produces state-of-the-art results, achieving 1st and 2 place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and achieving state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.

6.

GradTac: Spatio-Temporal Gradient Based Tactile Sensing.

Ganguly, Kanishka; Mantripragada, Pavan; Parameshwara, Chethan M; Fermüller, Cornelia; Sanket, Nitin J; Aloimonos, Yiannis.

Front Robot AI ; 9: 898075, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35783023

RESUMO

Tactile sensing for robotics is achieved through a variety of mechanisms, including magnetic, optical-tactile, and conductive fluid. Currently, the fluid-based sensors have struck the right balance of anthropomorphic sizes and shapes and accuracy of tactile response measurement. However, this design is plagued by a low Signal to Noise Ratio (SNR) due to the fluid based sensing mechanism "damping" the measurement values that are hard to model. To this end, we present a spatio-temporal gradient representation on the data obtained from fluid-based tactile sensors, which is inspired from neuromorphic principles of event based sensing. We present a novel algorithm (GradTac) that converts discrete data points from spatial tactile sensors into spatio-temporal surfaces and tracks tactile contours across these surfaces. Processing the tactile data using the proposed spatio-temporal domain is robust, makes it less susceptible to the inherent noise from the fluid based sensors, and allows accurate tracking of regions of touch as compared to using the raw data. We successfully evaluate and demonstrate the efficacy of GradTac on many real-world experiments performed using the Shadow Dexterous Hand, equipped with the BioTac SP sensors. Specifically, we use it for tracking tactile input across the sensor's surface, measuring relative forces, detecting linear and rotational slip, and for edge tracking. We also release an accompanying task-agnostic dataset for the BioTac SP, which we hope will provide a resource to compare and quantify various novel approaches, and motivate further research.

7.

Topology-Aware Non-Rigid Point Cloud Registration.

Zampogiannis, Konstantinos; Fermuller, Cornelia; Aloimonos, Yiannis.

IEEE Trans Pattern Anal Mach Intell ; 43(3): 1056-1069, 2021 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-31514126

RESUMO

In this paper, we introduce a non-rigid registration pipeline for pairs of unorganized point clouds that may be topologically different. Standard warp field estimation algorithms, even under robust, discontinuity-preserving regularization, tend to produce erratic motion estimates on boundaries associated with 'close-to-open' topology changes. We overcome this limitation by exploiting backward motion: in the opposite motion direction, a 'close-to-open' event becomes 'open-to-close', which is by default handled correctly. At the core of our approach lies a general, topology-agnostic warp field estimation algorithm, similar to those employed in recently introduced dynamic reconstruction systems from RGB-D input. We improve motion estimation on boundaries associated with topology changes in an efficient post-processing phase. Based on both forward and (inverted) backward warp hypotheses, we explicitly detect regions of the deformed geometry that undergo topological changes by means of local deformation criteria and broadly classify them as 'contacts' or 'separations'. Subsequently, the two motion hypotheses are seamlessly blended on a local basis, according to the type and proximity of detected events. Our method achieves state-of-the-art motion estimation accuracy on the MPI Sintel dataset. Experiments on a custom dataset with topological event annotations demonstrate the effectiveness of our pipeline in estimating motion on event boundaries, as well as promising performance in explicit topological event detection.

8.

A bug's-eye view.

Aloimonos, Yiannis; Fermüller, Cornelia.

Sci Robot ; 5(44)2020 Jul 15.

Artigo em Inglês | MEDLINE | ID: mdl-33022608

RESUMO

An insect-scale visual sensing system indicates the return of active vision for robotics.

9.

Symbolic Representation and Learning With Hyperdimensional Computing.

Mitrokhin, Anton; Sutor, Peter; Summers-Stay, Douglas; Fermüller, Cornelia; Aloimonos, Yiannis.

Front Robot AI ; 7: 63, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-33501231

RESUMO

It has been proposed that machine learning techniques can benefit from symbolic representations and reasoning systems. We describe a method in which the two can be combined in a natural and direct way by use of hyperdimensional vectors and hyperdimensional computing. By using hashing neural networks to produce binary vector representations of images, we show how hyperdimensional vectors can be constructed such that vector-symbolic inference arises naturally out of their output. We design the Hyperdimensional Inference Layer (HIL) to facilitate this process and evaluate its performance compared to baseline hashing networks. In addition to this, we show that separate network outputs can directly be fused at the vector symbolic level within HILs to improve performance and robustness of the overall model. Furthermore, to the best of our knowledge, this is the first instance in which meaningful hyperdimensional representations of images are created on real data, while still maintaining hyperdimensionality.

10.

An Embodied Tutoring System for Literal vs. Metaphorical Concepts.

Sionti, Marietta; Schack, Thomas; Aloimonos, Yiannis.

Front Psychol ; 9: 2254, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30546328

RESUMO

In this paper we combine motion captured data with linguistic notions (preliminary study) in a game-like tutoring system (study 1), in order to help elementary school students to better differentiate literal from metaphorical uses of motion verbs, based on embodied information. In addition to the thematic goal, we intend to improve young students' attention and spatiotemporal memory, by presenting sensorimotor data experimentally collected from thirty two participants in our motion capturing labs. Furthermore, we examine the accomplishment of tutor's goals and compare them to curriculum's approach (study 2). Sixty nine elementary school students were randomly divided in two experimental groups (game-like and traditional) and one control group, which did not undergo an intervention. All groups were tested in pre and post-tests. Even though the diagnostic pretests present a uniform picture, two way analysis of variance suggests that the experimental groups showed progress in post-tests and, more specifically, game-like group showed less wrong answers in the linguistics task and higher learning achievements compared to the other two groups. Furthermore, in the game-like condition the participants needed gradually shorter period of time to identify the avatar's actions. This finding was considered as a first indication of attentional and spatiotemporal memory's improvement, while the tutor's assistance features cultivated students' metacognitive perception.

11.

Revisiting active perception.

Bajcsy, Ruzena; Aloimonos, Yiannis; Tsotsos, John K.

Auton Robots ; 42(2): 177-196, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-31983809

RESUMO

Despite the recent successes in robotics, artificial intelligence and computer vision, a complete artificial agent necessarily must include active perception. A multitude of ideas and methods for how to accomplish this have already appeared in the past, their broader utility perhaps impeded by insufficient computational power or costly hardware. The history of these ideas, perhaps selective due to our perspectives, is presented with the goal of organizing the past literature and highlighting the seminal contributions. We argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.

12.

A Dataset for Visual Navigation with Neuromorphic Methods.

Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi.

Front Neurosci ; 10: 49, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-26941595

RESUMO

Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

13.

Leadership in orchestra emerges from the causal relationships of movement kinematics.

D'Ausilio, Alessandro; Badino, Leonardo; Li, Yi; Tokay, Sera; Craighero, Laila; Canto, Rosario; Aloimonos, Yiannis; Fadiga, Luciano.

PLoS One ; 7(5): e35757, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-22590511

RESUMO

Non-verbal communication enables efficient transfer of information among people. In this context, classic orchestras are a remarkable instance of interaction and communication aimed at a common aesthetic goal: musicians train for years in order to acquire and share a non-linguistic framework for sensorimotor communication. To this end, we recorded violinists' and conductors' movement kinematics during execution of Mozart pieces, searching for causal relationships among musicians by using the Granger Causality method (GC). We show that the increase of conductor-to-musicians influence, together with the reduction of musician-to-musician coordination (an index of successful leadership) goes in parallel with quality of execution, as assessed by musical experts' judgments. Rigorous quantification of sensorimotor communication efficacy has always been complicated and affected by rather vague qualitative methodologies. Here we propose that the analysis of motor behavior provides a potentially interesting tool to approach the rather intangible concept of aesthetic quality of music and visual communication efficacy.

Assuntos

Gestos , Liderança , Fenômenos Biomecânicos , Humanos , Música

14.

Active visual segmentation.

Mishra, Ajay K; Aloimonos, Yiannis; Cheong, Loong-Fah; Kassim, Ashraf A.

IEEE Trans Pattern Anal Mach Intell ; 34(4): 639-53, 2012 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-22383341

RESUMO

Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. The human eyes fixate at important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the "optimal" closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space. The proposed segmentation process is carried out in two separate steps: First, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the "optimal" closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues (edges). In fact, we propose a segmentation refinement process based on such a feedback process. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general purpose visual system.

Assuntos

Algoritmos , Olho , Processamento de Imagem Assistida por Computador/métodos , Visão Ocular/fisiologia , Sinais (Psicologia) , Percepção de Forma , Humanos

15.

The minimalist grammar of action.

Pastra, Katerina; Aloimonos, Yiannis.

Philos Trans R Soc Lond B Biol Sci ; 367(1585): 103-17, 2012 Jan 12.

Artigo em Inglês | MEDLINE | ID: mdl-22106430

RESUMO

Language and action have been found to share a common neural basis and in particular a common 'syntax', an analogous hierarchical and compositional organization. While language structure analysis has led to the formulation of different grammatical formalisms and associated discriminative or generative computational models, the structure of action is still elusive and so are the related computational models. However, structuring action has important implications on action learning and generalization, in both human cognition research and computation. In this study, we present a biologically inspired generative grammar of action, which employs the structure-building operations and principles of Chomsky's Minimalist Programme as a reference model. In this grammar, action terminals combine hierarchically into temporal sequences of actions of increasing complexity; the actions are bound with the involved tools and affected objects and are governed by certain goals. We show, how the tool role and the affected-object role of an entity within an action drives the derivation of the action syntax in this grammar and controls recursion, merge and move, the latter being mechanisms that manifest themselves not only in human language, but in human action too.

Assuntos

Comportamento/fisiologia , Linguística , Neurônios/fisiologia , Software , Fenômenos Biomecânicos , Biologia Computacional , Gestos , Objetivos , Humanos , Aprendizagem , Movimento , Neurobiologia , Reconhecimento Visual de Modelos/fisiologia , Desempenho Psicomotor/fisiologia

16.

Image transformations and blurring.

Domke, Justin; Aloimonos, Yiannis.

IEEE Trans Pattern Anal Mach Intell ; 31(5): 811-23, 2009 May.

Artigo em Inglês | MEDLINE | ID: mdl-19299857

RESUMO

Since cameras blur the incoming light during measurement, different images of the same surface do not contain the same information about that surface. Thus, in general, corresponding points in multiple views of a scene have different image intensities. While multiple-view geometry constrains the locations of corresponding points, it does not give relationships between the signals at corresponding locations. This paper offers an elementary treatment of these relationships. We first develop the notion of "ideal" and "real" images, corresponding to, respectively, the raw incoming light and the measured signal. This framework separates the filtering and geometric aspects of imaging. We then consider how to synthesize one view of a surface from another; if the transformation between the two views is affine, it emerges that this is possible if and only if the singular values of the affine matrix are positive. Next, we consider how to combine the information in several views of a surface into a single output image. By developing a new tool called "frequency segmentation," we show how this can be done despite not knowing the blurring kernel.

Assuntos

Algoritmos , Inteligência Artificial , Aumento da Imagem/métodos , Interpretação de Imagem Assistida por Computador/métodos , Imageamento Tridimensional/métodos , Reconhecimento Automatizado de Padrão/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

17.

Sensory grammars for sensor networks.

Aloimonos, Yiannis.

J Ambient Intell Smart Environ ; 1(1): 15-21, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-21897837

RESUMO

One of the major goals of Ambient Intelligence and Smart Environments is to interpret human activity sensed by a variety of sensors. In order to develop useful technologies and a subsequent industry around smart environments, we need to proceed in a principled manner. This paper suggests that human activity can be expressed in a language. This is a special language with its own phonemes, its own morphemes (words) and its own syntax and it can be learned using machine learning techniques applied to gargantuan amounts of data collected by sensor networks. Developing such languages will create bridges between Ambient Intelligence and other disciplines. It will also provide a hierarchical structure that can lead to a successful industry.

18.

Active Segmentation.

Mishra, Ajay; Aloimonos, Yiannis.

Int J HR ; 6(3): 361-386, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-20686671

RESUMO

The human visual system observes and understands a scene/image by making a series of fixations. Every fixation point lies inside a particular region of arbitrary shape and size in the scene which can either be an object or just a part of it. We define as a basic segmentation problem the task of segmenting that region containing the fixation point. Segmenting the region containing the fixation is equivalent to finding the enclosing contour- a connected set of boundary edge fragments in the edge map of the scene - around the fixation. This enclosing contour should be a depth boundary.We present here a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation. The proposed segmentation framework combines monocular cues (color/intensity/texture) with stereo and/or motion, in a cue independent manner. The semantic robots of the immediate future will be able to use this algorithm to automatically find objects in any environment. The capability of automatically segmenting objects in their visual field can bring the visual processing to the next level. Our approach is different from current approaches. While existing work attempts to segment the whole scene at once into many areas, we segment only one image region, specifically the one containing the fixation point. Experiments with real imagery collected by our active robot and from the known databases 1 demonstrate the promise of the approach.

19.

Motion segmentation using occlusions.

Ogale, Abhijit S; Fermüller, Cornelia; Aloimonos, Yiannis.

IEEE Trans Pattern Anal Mach Intell ; 27(6): 988-92, 2005 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-15943429

RESUMO

We examine the key role of occlusions in finding independently moving objects instantaneously in a video obtained by a moving camera with a restricted field of view. In this problem, the image motion is caused by the combined effect of camera motion (egomotion), structure (depth), and the independent motion of scene entities. For a camera with a restricted field of view undergoing a small motion between frames, there exists, in general, a set of 3D camera motions compatible with the observed flow field even if only a small amount of noise is present, leading to ambiguous 3D motion estimates. If separable sets of solutions exist, motion-based clustering can detect one category of moving objects. Even if a single inseparable set of solutions is found, we show that occlusion information can be used to find ordinal depth, which is critical in identifying a new class of moving objects. In order to find ordinal depth, occlusions must not only be known, but they must also be filled (grouped) with optical flow from neighboring regions. We present a novel algorithm for filling occlusions and deducing ordinal depth under general circumstances. Finally, we describe another category of moving objects which is detected using cardinal comparisons between structure from motion and structure estimates from another source (e.g., stereo).

Assuntos

Algoritmos , Inteligência Artificial , Interpretação de Imagem Assistida por Computador/métodos , Movimento , Reconhecimento Automatizado de Padrão/métodos , Fotografação/métodos , Gravação em Vídeo/métodos , Aumento da Imagem/métodos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA